Optimising the Volgenant-Jonker algorithm for approximating graph edit distance
Abstract
Although it is agreed that the Volgenant-Jonker (VJ) algorithm provides a fast way to approximate graph edit distance (GED), until now nobody has reported how the VJ algorithm can be tuned for this task. To this end, we revisit VJ and propose a series of refinements that improve both the speed and memory footprint without sacrificing accuracy in the GED approximation. We quantify the effectiveness of these optimisations by measuring distortion between control-flow graphs: a problem that arises in malware matching. We also document an unexpected behavioural property of VJ in which the time required to find shortest paths to unassigned vertices decreases as graph size increases, and explain how this phenomenon relates to the birthday paradox. © 2016 Elsevier Ltd. All rights reserved.
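For readers unfamiliar with the setting, the following is a minimal sketch of how a GED approximation is commonly cast as a linear sum assignment problem, which is the kind of problem VJ solves. It is not the authors' implementation: the unit costs, the label-only node comparison (edges are ignored), and the function names `build_cost_matrix`, `solve_assignment_brute_force`, and `approximate_ged` are assumptions made for illustration, and a brute-force search stands in for VJ, so it is only usable on very small inputs.

```python
from itertools import permutations

INF = float("inf")


def build_cost_matrix(labels_a, labels_b, sub_cost=1.0, indel_cost=1.0):
    """Square (n+m) x (n+m) cost matrix over node labels (assumed unit costs)."""
    n, m = len(labels_a), len(labels_b)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                # Substitution block: free when the labels match.
                cost[i][j] = 0.0 if labels_a[i] == labels_b[j] else sub_cost
            elif i < n:
                # Deletion block: node i may only use its own dummy column.
                cost[i][j] = indel_cost if j - m == i else INF
            elif j < m:
                # Insertion block: node j may only use its own dummy row.
                cost[i][j] = indel_cost if i - n == j else INF
            # Dummy-to-dummy block stays at zero cost.
    return cost


def solve_assignment_brute_force(cost):
    """Stand-in for VJ: try every permutation and keep the cheapest one."""
    size = len(cost)
    best_total, best_perm = INF, None
    for perm in permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


def approximate_ged(labels_a, labels_b):
    """Node-label-only estimate of the edit cost between two small graphs."""
    total, _ = solve_assignment_brute_force(build_cost_matrix(labels_a, labels_b))
    return total
```

For example, `approximate_ged(["call", "jmp", "ret"], ["call", "ret"])` matches the two shared labels at no cost and pays one deletion for `jmp`. A dedicated assignment solver such as VJ would replace the permutation search on graphs of realistic size.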
Related resources
Revisiting Volgenant-Jonker for Approximating Graph Edit Distance
Although it is agreed that the Volgenant-Jonker (VJ) algorithm provides a fast way to approximate graph edit distance (GED), until now nobody has reported how the VJ algorithm can be tuned for this task. To this end, we revisit VJ and propose a series of refinements that improve both the speed and memory footprint without sacrificing accuracy in the GED approximation. We quantify the effectiven...
Optimising Tree Edit Distance with Subtrees for Textual Entailment
This paper introduces a method for improving tree edit distance (TED) for textual entailment. We explore two ways of improving TED: we extend the standard TED to use edit operations that apply to subtrees as well as to single nodes; and we use the ‘artificial bee colony’ algorithm (ABC) to estimate the cost of edit operations for single nodes and subtrees and to determine thresholds. The prelim...
Comparing Stars: On Approximating Graph Edit Distance
Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision etc. Unfortunately, the problem of graph edit distance computation is NP-Hard in general. Accor...
Approximating Graph Edit Distance Using GNCCP
The graph edit distance (GED) is a flexible and widely used dissimilarity measure between graphs. Computing the GED between two graphs can be performed by solving a quadratic assignment problem (QAP). However, the problem is NP-complete, hence forbidding the computation of the optimal GED on large graphs. To tackle this drawback, recent heuristics are based on a linear approximation of the initi...
Approximating optimal solution structure with edit distance and its applications
An alternative notion of approximation arising in cognitive psychology, bioinformatics and linguistics is that of computing a solution which is structurally close to an optimal one. That is, an approximate solution is considered good if its distance from an optimal solution is small, for a distance measure such as Hamming distance or edit distance. There has been a modicum of work on approximat...
Journal: Pattern Recognition Letters
Volume: 87
Publication date: 2017